Bayesian Model Selection for LISA Pathfinder 
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The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise 
models and to test key technologies for future space-based gravitational- wave observatories similar to 
the LISA/eLISA concept. The Data Analysis (DA) team has developed complex three-dimensional 
models of the LISA Technology Package (LTP) experiment on-board LPF. These models are used 
for simulations, but more importantly, they will be used for parameter estimation purposes during 
flight operations. One of the tasks of the DA team is to identify the physical effects that contribute 
significantly to the properties of the instrument noise. A way of approaching to this problem is to 
recover the essential parameters of the LTP which describe the data. Thus, we want to define the 
simplest model that efficiently explains the observations. To do so, adopting a Bayesian framework, 
one has to estimate the so-called Bayes Factor between two competing models. In our analysis, we 
use three main different methods to estimate it: The Reversible Jump Markov Chain Monte Carlo 
method, the Schwarz criterion, and the Laplace approximation. They are applied first to toy models 
and then, they are verified with full LTP models, where we investigate the correlation of the output 
of these methods with the design of the experiment itself. 



I. INTRODUCTION 



LISA Pathfinder (LPF) [H, [2| is an European Space 
Agency mission that will serve as a technology demon- 
strator for a future space-based gravitational-wave ob- 
servatory like eLISA/NGO [3]. It will be launched to 
fly at the Lagrange point LI and will test key technolo- 
gies of low frequency gravitational wave metrology. The 
LPF mission will prove geodesic motion by monitoring 
the relative acceleration of two test masses in nominally 
free-fall in the frequency band of 1 to 100 mHz. The 
main instrument on-board LPF is the LISA Technology 
Package (LTP), which is an experiment with the aim of 
measuring and characterizing the different contributions 
to the differential acceleration noise between the two test 
masses. This characterization is the main task of the 
Data Analysis team. To that end, dedicated experiments 
are going to be performed in order to estimate the un- 
known parameters of the system. And for that purpose, a 
number of parameter estimation methods and models of 
the LTP have been implemented [^-[y]- The main ques- 
tion that arises is about the suitability of the different 
models implemented, or in simpler terms, which model 
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can describe better the observations of the experiment. 

The motivation to implement an algorithm that will 
help us to classify our models is due to the nature of the 
LTP system. There is the need to quantify the effects 
on the disturbance measurement of the LPF. There has 
been a lot of work trying to understand the instrument 
and the models implemented are based on theoretical or 
experimental measurements from test campaigns [7|, Q . 
Selecting the most probable model is crucial for the anal- 
ysis for two main reasons. First, the most suitable LTP 
experiment that describes the observed physical effects is 
chosen, and secondly, over-fitting situations are avoided, 
together with biased estimation of the parameters of the 
system. 

One could use several criteria and algorithms that clas- 
sify competing models, but in the end, working in a 
Bayesian framework, the main aim is to calculate the 
Bayes Factor^ which is a comparison between the evi- 
dences of the models [9, 10]. The evidence is described 
as the probability of the data y given the model, that 
is, the probability distribution on y that quantifies the 
predictive capabilities of the given model. 

Most of the approaches are based upon the likelihood 
evaluated at the maximum and a penalty for the number 
of parameters in the model consisting in multiplying by 
the so-called Occam Factor. The Occam's razor is the 
principle that states that the simplest hypothesis that 
explains the observations is the most favorable. In our 



case we assume that we have to compare two different 
LTP models; model Mi and a simpler one, M2, over a 
data set y. If Mi is a more complex model, it presents 
more predictive capabilities than M2, which translates to 
more disperse evidence over the data sets y. Thus, in the 
case where the data is compatible with both models, M2 
will turn out to be more probable than Mi , without hav- 
ing to express any subjective dislike for complex models 
[10|. In other words, it is almost certain that the more 
parameters in a model the better the fit of the data, but 
taking it to the extreme, we can imagine a model with as 
many parameters as the data to fit. In that case we have 
over-parameterized the model while the aim is to include 
only the parameters which substantially improve it. 

In this paper we investigate several methods to com- 
pare competing LTP models giving emphasis to Re- 
versible Jump Markov Chain Monte Carlo techniques. 
The plan of the paper is as follows: In Sec. [Ill we make 
a brief introduction to Bayesian methods. We illustrate 
these methods by applying them to a toy model (see Ap- 
pendix [A]). In Sec, mil the LTP experiments and models 
are thoroughly explained. In Sec. [IVlwe investigate sev- 
eral applications of our Bayesian techniques to LTP ex- 
periments. In Sec. |Vl we compare the available methods 
and discuss their output in correlation with the experi- 
ment design. We end with a summary and conclusions 
in section IVTl The codes for the different algorithms are 
integrated into the LTPDA Matlab toolbox [11, 12], that 
comes together with proper documentation and help for 
the user. 



rized by the following equation: 
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where ^ is a model parameter vector, 7r{y\0) is the likeli- 
hood of the parameters over the data-set y, and 7r{0\y) 
and p{0) are the posterior and the prior distributions of 
the parameters respectively. Note that the evidence 7r(y), 
or marginal likelihood^ is often neglected in parameter es- 
timation algorithms, as it serves only as a normalization 
constant: 



7r(^|y) ex 7r{y\e)p{e). 



(2) 



The Metropolis algorithm is one of the MCMC-type 
methods available that is widely used for system identi- 
fication purposes. It is based on the efficient sampling of 
the parameter space by proposing new samples On and 
evaluating the likelihood at each step. By sampling the 
posterior distribution, we assign probabilities to the pa- 
rameters that we want to estimate. For the generation 
of each sample On we use a multivariate Gaussian Prob- 
ability Density Function (PDF) with its covariance ma- 
trix extracted from the computation of the Fisher In- 
formation Matrix (FIM) [4] as its inverse. The FIM is 
calculated numerically or analytically, depending on the 
complexity of the model that we use. It is certain that 
given a large amount of steps in the parameter space, the 
Metropolis algorithm will converge to the set of parame- 
ters ^MAP that maximize the likelihood. 



II. BACKGROUND ON BAYESIAN STATISTICS 



An algorithm that automatically penalizes higher di- 
mensional models and at the same time performs pa- 
rameter estimation is the Reverse Jump Markov Chain 
Monte Carlo (RJMCMC) algorithm. The RJMCMC 
method |13l-[l5| is widely used when dealing with nested 
models, meaning that we need to compare a set of models, 
where simpler models are a subset of a more complicated 
one. This is also the case of the LTP models developed, 
where one can use from one to all of the four-hundred 
parameters available. In fact, the RJMCMC algorithm 
is a special case of Markov Chain Monte Carlo (MCMC) 
methods that is capable of sampling the parameter space 
and at the same time jumping between models with dif- 
ferent dimensionality. The MCMC method implemented 
in the LTPDA toolbox [4] is based on the Metropolis al- 
gorithm and we can say that the RJMCMC method de- 
veloped is a generalized form of the same algorithm. 

At this point it would be convenient to describe the 
philosophy of the Bayesian framework and the Metropo- 
lis algorithm and then move to the general case of the 
RJMCMC algorithm. The Bayes rule can be summa- 



A. Calculating the Bayes Factor 

The evidence of a hypothesis X, 7rx(y), states the sup- 
port for this hypothesis given the observations, or in 
other words, how much the data favors our model. In 
our case, the hypothesis is the model we implement to 
describe the LTP system. Consequently, given two dif- 
ferent models X and Y, the Bayes Factor Bxy is a com- 
parison between the evidences of model X and model Y 
given by their ratio 
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where 



My) = J AO,K\y)de, K = X,Y. 



(3) 



(4) 



This integral is extremely difficult and costly to evalu- 
ate, specially when the model becomes complicated with 
higher dimensionality. In the next sections some esti- 
mators of the Bayes Factor are described. Most of them 
evaluate directly the Bayes factor, while other techniques 
are used to estimate the evidence for each model. All of 



them are used to investigate our LPF models and mis- 
sion experiments. In the end, we can draw conclusions 
about the competing models given the estimated value of 
the Bayes Factor. If Bxy < 1, the evidence is negative 
and the observations support model Y. If Bxy > 1, the 
evidence is positive and model X is more probable than 
model Y. 



B. The Reversible Jump Markov Chain Monte 
Carlo Algorithm 

The RJMCMC algorithm is a robust and efficient tool 
to estimate the Bayes Factor. It can be shown [13|, [l5| 
that after a large number of iterations it will converge to 
the true value of the Bayes Factor. The only drawback is 
the computational cost of the algorithm. When more 
than three models are being compared, meaning that 
many transdimensional moves have to be performed, a 
considerable amount of time is required for convergence. 
The algorithm implemented in this work is a special case 
of the Metropolized Carlin and Chib method [ij . More 
precisely, let us suppose that we have a total number 
K of models to compare given a data set y. Then, the 
recipe for our RJMCMC method can be summarized in 
the following steps: 

1. Initialization: Choose an initial model k and the cor- 
responding parameters Ok. 

2. Apply the Metropolis algorithm for model k. This step 
is also called the "in model step" . 

3. Generate new Ok' from a multivariate Gaussian PDF 
and a random number u ~ /7[0,1]. This is the step 
where we propose a new model /c^ 

4. Calculate the acceptance ratio a^: 



^{y\0k')p{0k')g{uk') 
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where g{u) is the proposal distribution from where the 
random numbers u are drawn and IJI is the Jacobian: 
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(6) 



5. IfuKa'we accept the new model k' with parameters 
Ok' and set Ok = Ok' • 

6. Iterate from step 2 until convergence. 

The new set of parameters is connected to the old one by 
a deterministic function Ok = q{Ok'^u) (and of course 
Ok' = Q'{Ok^u')). We use independent proposals, so 
Ok = Q{Ok'^u) = u and Ok' = q'{Ok^u') = u' ^ thus, the 
Jacobian term in equation (J5j) is unity. The algorithm 
spends most of the time iterating "inside" the model that 
best describes the data. The RJMCMC method auto- 
penalizes higher-dimensional models, also by taking into 
account the priors p{Ok) of each model k. They serve as 



an Occam Factor integrated within the algorithm. Con- 
vergence is achieved if two main conditions are satisfied: 
First, the condition of reversibility which is stated in a 
simple way: The proposal function must be invertible, 
meaning that we can jump from the proposed param- 
eters back to the current parameters. And second, we 
must satisfy the dimension matching condition which in 
our case is always true since we use independent propos- 
als in the acceptance ratio. After convergence has been 
achieved, a good approximation to the Bayes Factor is 
given by: 



Bxy 



# of iterations in model X 

# of iterations in model Y 



(7) 



C. other approximations 

While the RJMCMC method directly estimates the 
Bayes Factor, the other methods implemented in our 
work make an approximation to the evidence of each 
model. They have been developed mainly as a cross- 
check for the RJMCMC method, but they also provide 
certain freedom of choice, depending on the nature of the 
problem, as well as the data available. 

The first approximations we consider are the Laplace 
approximations. The Laplace approximations perform 
a comparison between the volume of the models in the 
parameter space and the volume of the uncertainty ellip- 
soid of the parameters [9|. This comparison is feasible 
if we assume that we work at high Signal-to- Noise Ratio 
(SNR) and therefore the posterior PDF is Gaussian near 
the maximum likelihood parameters, ^map- Another re- 
quirement is that a large number of samples must have 
been collected. Then, the evidence in Eq. (j4]) becomes: 
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where I^x is the number of dimensions of model X and 
H is the Hessian matrix of the model. There are two 
main variations of the Laplace approximation. In the 
first one we make use of the Fisher Information Matrix F, 
calculated at ^map , as an approximation to the expected 
covariance matrix [16|, [l^- Then, the evidence of the 
model becomes: 



7rx(y) - (27r) 
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(9) 



Of course, the main limitations of this method are asso- 
ciated with the confidence we have on the calculation of 
the Fisher Matrix. Furthermore, as expected, the results 
appear to be poorer in comparison with the other meth- 
ods as we move towards lower SNR areas. We can follow 
the notation of [16] and call this particular approxima- 
tion the Laplace-Fisher (LF) approximation. Another 
well-known variation is the Laplace-Metropolis (LM) es- 
timator of the marginal likelihood [9]. In this case, we use 
all necessary components for the calculation of the evi- 
dence from previous MCMC estimates. The parameters 



^MAP are extracted from the chains of a MCMC parame- 
ter estimation run for the particular model, while we use 
the weighted covariance matrix of the chains T, , using a 
Minimum Volume Ellipsoid (MVE) or a Minimum Co- 
variance Determinant estimator (MCD) [18]. The MVE 
method has been implemented and integrated in the LT- 
PDA toolbox. In this case, the evidence of model X can 
be written as 



7rx(y) - (27r) 
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^(^MAP,X|y). (10) 



The LM method is considered to be a very reliable tool 
for the computation of the evidence of a model |9] . The 
third method used is the Schwarz-Bayes Information Cri- 
terion (SBIC) and is based on the following: After the 
assumption that the priors for each model follow a mul- 
tivariate Gaussian PDF, the Schwarz criterion is defined 
as: 



S ^ ln(^(^MAP,X |y)) - ln(^(^MAP, Y |y)) 
-l/2{Dx-DY)Hn), 



(11) 



where I^x and I^y are the dimensions of each model and 
n is the number of samples in the data. It can be proven 
that if n ^ oo, then S -^ Bxy and the Schwarz criterion 
can be a good approximation to the Bayes factor. In fact, 
n must be chosen carefully so that n = A/eff, where A'eff is 
the number of effective samples in the data that represent 
the growth of the Hessian matrix of the log-likelihood [9] . 



D. Implementation 

After the introduction of the comparison criteria and 
techniques, we must define the particular system transfer 
function and the mathematical expression of the likeli- 
hood. If the system's transfer function is iis{^) and we 
consider known injection signals, x, then the output of 
the system, y, can be expressed as: 



y = H,(l9)x + n, 



(12) 



where n is the overall instrument noise and h{0) = 
H5(^)x is the response or template of the system. While 
the noise in the LTP experiment may have a dependence 
on the parameters, it can be approximated with Gaussian 
noise since we work in a high SNR regime and this de- 
pendence on the system parameters is negligible. On the 
other hand, the likelihood for a model with parameters 
can be written as 



where Sn{f) is the power spectral density of the noise. 
For the proposal distribution g{u) appearing in the ac- 
ceptance ratio of the Metropolis-Hastings algorithm a 
multivariate Gaussian distribution is used. Although the 
choice of the proposal distribution should not affect the 
final estimated parameter PDFs, it greatly affects the 
convergence speed of the chains to the target posterior 
distribution. The covariance of the multivariate Gaussian 
is calculated beforehand or during the search phase us- 
ing the Fisher Information Matrix (FIM). Each element 
of the FIM is defined as 



' d\og{7T{y\0)) \ 




, (15) 



and is calculated at a set of numerical values 6o of the 
parameters. The Cramer- Rao Bound (CRB), as given by 
the components of the inverse of the FIM, provides an es- 
timation of the expected errors in the parameters [I3 • In 
the case of the RJMCMC method, the FIM is calculated 
once for each model before the execution of the algorithm. 
It is computed numerically or analytically depending on 
the model. The more complete versions of the LTP mod- 
els are coded in State-Space (State Space Models) format 
and their differentiation is performed numerically. More 
details on this can be found in |2Q|-[22|. 

The first step in the analysis is to apply a Fourier trans- 
formation to the data and estimate the power spectral 
density of the noise n, i.e. Snif)- As described in [j], 
the nature of the LTP experiment allows us to combine 
all the available information from different investigations. 
That means that we can combine the posterior distribu- 
tion for each data set for each model. Then, we can apply 
MCMC analysis strategies to sample the posteriors and 
make use of any of the criteria described in section III CI 
or sample the joint parameter spaces with the RJMCMC 
method that was described in section IIIBI Technically, 
since the RJMCMC algorithm can be seen as a general- 
ized MCMC method, it performs a parameter search by 
simultaneously sampling the likelihood and consequently 
it maps the joint posterior distribution and assigns PDFs 
to the parameters. The extra point is that we can esti- 
mate directly the ratio of the evidences of the models and 
test the hypothesis made for the data. 

To illustrate all these methods we have applied them 
to a toy model in Appendix [A] 



III. MODELING THE LTP EXPERIMENT 



7r{y\e) = Ce^p[- 
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(13) 



where the angular brackets denote the noise weighted 
inner product [i9| 



<a\b>=2 ~a*{f)b{f)+a{f)b*{f) /SM), (14) 
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As a technology demonstrator of a space-based 
gravitational-wave observatory, the LPF mission will 
place two 2 kg Test Masses (TMi and TM2) in free fah. 
The goal is to estimate the residual differential accelera- 
tion between TMi and TM2 \u\. To minimize all external 
forces acting on TMi along the x-axis, a drag-free control 
loop [23] has been designed. The coordinates of TM2 are 
controlled via electrostatic actuators and the Spacecraft 
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FIG. 1: Schematics of a simplified LTP model. It is composed by smaller SSMs of the various subsystems, each one a standalone 
SSM, here represented with white boxes. The rhombi represent the main injection inputs to the system. We mainly use the first 
injection port, which represents the interferometric inputs to the system, while the second rhombus stands for the capacitance 
actuators injection ports. The symbol ft represents noise contributions from various sources and finally, the parameters to fit, 
for the sake of convenience, are located in the gray boxes inside of the respective SSMs. See the text for more details. 



(SC) is controlled by micro-Newton thrusters. More 
specifically, the main components of the LPF mission are: 

• The Gravitational Reference Sensor (GRS) [2^. It con- 
sists of the TMs and the vacuum chamber around them. 
It is mounted with electrostatic actuators to control 
the 6 degrees of freedom of each TM, and also to apply 
forces and torques to keep the TMs in free fall. 

• The Optical Metrology System [25] consists of the op- 
tical bench, its subsystems and the processing com- 
puter. It performs the sensitive optical measurements 
of the positions of the TMs along the x-axis. For the 
simplified one-dimensional model version, we consider 
two interferometric inputs i and two outputs o of the 
system. The measured distances are xi and Xi2, the 
distance of TMi to the SC and the distance between 
TMi and TM2 respectively. The inputs ports are used 
to inject guidance signals. 

• The propulsion system for LPF was originally com- 
posed by 12 Field Emission Electric Propulsion 
(FEEP) thrusters installed on the SC. The main func- 
tion of the propulsion system is to maintain the refer- 
ence TMs in free fall conditions. The current design 
however is based on Cold Gas [2[ . In this work we con- 
sider the FEEPs model, since the Cold Gas SSM is not 
yet available, but in any case, it is safe to assume that 
the data analysis strategies will not be affected by this 
change. 



• Finally, the Drag-Free and Attitude Control Sys- 
tem (DFACS) calculates all forces and torques acting 
on the SC and TMs and computes the commanded 
forces/response of the system in order to maintain the 
nominal free fall of TMi. 

The LTP is a complicated instrument composed by a 
plethora of subsystems and coupled control loops. Dur- 
ing the mission, there will be dedicated experiments with 
the aim of characterizing the different noise contributions 
and couplings. In the following section, the LTP mod- 
els implemented for system identification experiments are 
described. We present simulations of the planned sys- 
tem identification experiments and use all techniques de- 
scribed to perform model selection. 



A. The LTP system dynamics 

For the purposes of this work we consider a simplified 
version of the LTP operating in the so-called main sci- 
ence mode and we confine the system dynamics to the 
one-dimensional case. At this point, given the nature 
of the LTP, it is very convenient to represent the model 
as a closed loop system, due to closed-loop dynamics and 
controllers of the instrument. In Fig. (TJ there is a simpli- 
fied schematic representation of the LTP SSM. The white 
boxes represent standalone SSMs, all together assembled 
to form the final LTP experiment [20]. Each submod- 



ule has injection u{t) and output y{t) ports and can be 
represented as: 



x{t) = Ax x(t) -^ B X u{t), 
yit) = C X x{t) ^Dx u{t), 



(16) I 



where x{t) are the states of the system, A is the state 
matrix, B is the input matrix, C is the output matrix, 
and D is the feed-through matrix [20] . This form of im- 
plementation has numerous advantages, hke modularity 
and flexibility in modeling the LPF. For example, the 
user is able to combine any given SSM module and use a 
custom noise model for each particular subsystem. 

For the purposes of this paper, we simulate the exper- 
iments as explained in [20|, U^ 123 • Then, the complete 
system is controlled by optical readouts that measure the 
positions of the Test Masses, xi and Xi2, where the x- 
axis is defined by the line joining them. In order to per- 
form parameter estimation we inject sinusoidal signals 
to each interferometric channel alternately and measure 
the response of the system. In fact, these inputs can be 
described as "fake" displacements of the TMs. For this 
type of experiments, we can define the following time- 
series arranged as vectors: 



, (17) 



where o is the measured signals vector, i is the injection 
signals vector and n is the overall noise for both channels. 
The noise contributions of each LTP subsystem are de- 
noted in Fig. [U as ft. It can be instrumental or read-out 
noise, or it may originate from external sources (like solar 
radiation). A typical response of a system with the de- 
fault values for each parameter as a function of frequency 
can be seen in Fig. [2l 

The LTPDA team has already implemented a com- 
plex three-dimensional LTP model. This 3D model 
contains hundreds of parameters, but for the sake of 
simplicity, we use the one-dimensional version and a 
subset of the available parameters to estimate. The 
set of parameters that we recover from the system is 
{co'i,co'2, Ati, At2, Adf, Asus,^2i}- ^1 and 002 stand for the 
electrostatic stiffness of the TMs to the surroundings, Ati 
and At 2 are time delays, A^f and Ag^s the capacitance 
and thruster gains respectively, and S21 denotes the cross- 
coupling between the two interferometric channels. 



IV. TESTS WITH SIMULATED DATASETS 



As described in [26|, |27| , during the planned mission ex- 
periments, sinusoidal signals of different frequencies are 
going to be injected to the control loop, simulating dis- 
placements of the test masses. Besides the LTP dynamics 
system, the DA team has modeled noise sources for the 
various subsystems of the LPF mission based on theoret- 
ical predictions or characterization of each element from 
on-ground test campaigns. The main noise contributions 
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FIG. 2: Bode plot of the LTP transfer functions using the de- 
fault values for the parameters for the case of the experiments 
described by the signals of Eq. ()17p. 



come from the FEEPs, the interferometric readouts, and 
the capacitance actuators [28| . We can simulate synthetic 
noise for any particular experiment and perform parame- 
ter estimation exercises in order to test the system iden- 
tification algorithm's readiness. A MCMC search of the 
parameter space for such a simulated run returns satis- 
factory results as shown in Table [H 



TABLE I: MCMC parameter estimation results from simu- 
lated experiments. 



Parameter Real value 



Estimated ±<7 



UJl 


1.3 X 10-^ 


(-1.2999 zb 0.0002) X 10"^ 


0J2 


1.9 X 10-6 


(-1.8999 ± 0.0002) x 10"^ 


Adf 


0.82 


0.8201 =b 0.00025 


^sus 


1.08 


1.080004 ± 3 X 10"^ 


Ati 


0.2 


0.20020 ± 5 X 10"^ 


A^2 


0.2 


0.2003 ± 2 X 10"^ 


^21 


0.0004 


0.0004001 ± 1 X 10"^ 



Following the same logic as in the examples with toy 
models in Appendix [Aj synthetic noise can be generated 
and the response of a known system can be simulated. 
For the following cases studies, the LTP default parame- 
ters were always set to the same values given in Table [H 
It should be noted that there is no reason to have any 
strong prior belief about the true values of the parame- 
ters. Therefore, it is reasonable to assume uniform priors 
for all the tests. 



A. Guidance delay investigation 



In order to demonstrate a first application to the LTP, 
we can investigate a model selection case that was first 
encountered during a data analysis exercise [26] . The aim 
was to test the parameter estimation algorithms with a 
dataset generated with the LTP simulator used in Op- 
erational Exercises ^], where the "true" values of the 
parameters of the system were totally unknown. In this 
situation, the DA team was able to perform parameter 
estimation, but it was noted that the fit was improved 
when two extra parameters were introduced. These pa- 
rameters are the guidance delays, Ati and At 2. They 
are simply time delays to the application of the guidance 
signals (see Fig. [1]), due to operations of the Data Man- 
agement Unit (DMU), which is the on-board computer 
controlling the LTP experiment. The final robustness of 
the fit indicates that the new parameters substantially 
improve the model. 

This problem can be better studied by reducing it to a 
model selection problem and can be tested with synthetic 
data-sets. The simulated data source is a LPF system 
with the default characteristics given in Table |I] with the 
exception of the delays, which were set to: Ati = At2 = 
0. With this configuration, we assume a true system 
where the application signals are applied instantaneously. 
For our first tryout we injected "fake" inter ferometric 
displacements, while for the second investigation we used 
the same structure of injections but with much lower SNR 
(~5). 

In order to determine the importance of the two extra 
parameters, we have to verify which model describes bet- 
ter the particular data-set: The seven parameter model X 
with parameters Ox = {a;i,a;2, Ati, At2, Adf, Asus,^2i}, 
or the five parameter model Y with parameters Oy = 
{cji, 60^2, ^df , ^sus, ^21 }• While both X and Y models are 
capable of explaining the observations, we expect the sim- 
pler model Y to be more favorable from a RJMCMC 
output, since the extra parameters Ati and At2 are not 
significant for the data. The evolution of the Bayes factor 
for such an investigation, is shown in Fig. [3l 



TABLE II: Results for the Guidance delay investigation with 
the second low SNR experiment of the guidance delay inves- 
tigation. See text for details. 



Method 


BxY 


RJMCMC 


0.309 


LF 


0.124 


LM 


0.078 


SBIC 


0.768 
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FIG. 3: First 3 x 10^ iteration of the RJMCMC output when 
comparing a seven and a five- dimensional LTP model (models 
X and Y respectively). Since the models are not competitive 
when the SNR = 60, the blue line tends asymptotically to 



RJMCMC method, there is no single iteration "inside" 
the more complex model. This changes dramatically de- 
pending on the nature of the problem and, of course, as 
we show in the following section, on the SNR. In Table HIl 
we present only the second low- SNR experiment, for the 
sake of comparison between the methods. Each method 
seems to favor the simpler model but they are not in 
total agreement between them. This is to be expected, 
as the SNR of this investigation is very low and the ap- 
proximations of the evidence become more sensitive. For 
this particular case of the injections, the models are not 
competitive and therefore, the resulting estimated Bayes 
Factor is extremely small. For a direct comparison, the 
RJMCMC algorithm requires more than 10^-10^ itera- 
tions. 



B. Investigation on the interferometric 
cross-couplings 



The results we obtain for all the approximations verify 
that the simpler model Y is much more probable than 
the more complicated model X. For the particular exper- 
iments proposed in [27] where the SNR is high, the Bayes 
factor computed tends to zero. In fact, for the case of the 



The motivation to include this case arose from recent 
experience from operational exercises that took place in 
June and November of 2012. The aim of those exercises 
was to mimic real time operations and train the scientific 
team in terms of coordination over data analysis proce- 



dures. One of the conclusions gathered from such exer- 
cises is that there is going to be a thorough study on the 
instrument by examining the cross-couphngs between the 
readouts of ah the degrees of freedom for the test masses 
and the spacecraft. Sticking to the one-dimensional case, 
it is possible to examine interferometric cross-couplings 
between the injection channels for experiments similar to 
the ones in the previous section. The final aim is to ex- 
plore whether we are able to discover cross-couplings by 
computing the Bayes Factor between simpler and more 
complex models. 

The sensor matrix S in Fig. [1] contains calibration fac- 
tors of the interferometric readouts. The off-diagonal 
terms represent the possible cross-talks contributions be- 
tween the two measurements. In a simplified analytical 
form, and for the particular set of experiments that are 
defined with two injections and two readout channels (as 
in Eq. (fT7|) ). the sensor matrix can be written as: 



1 

^21 1 



(18) 



The cross- term under investigation is the component ^21 
in Eq. (p^ . 

The LTPDA SSM simulator is again used to produce 
data for a double investigation. The first data-set is 
generated with a LTP that is is equipped with a "per- 
fect" interferometer ((^21 = 0), while for the second there 
is a cross-coupling between the two channels given by 
S21 = 10~^. For both investigations we compare two 
models X and Y with 6x = {^i,^2,^df,^sus, ^21} and 
^Y = {<^i,^2,^df,^sus} respectively, or in other words, 
for both cases we try to recover the model with the di- 
mensionality that supports the observations. The injec- 
tion signals this time are in the form of single frequency 
sinusoidals with / = 0.01 Hz. This choice of simplified in- 
jections is justified by the fact that our aim is to demon- 
strate the performance of the algorithms and methods, 
and a multi-frequency signal yields to extremely large 
numerical values of the Bayes Factor, making the com- 
parison between the methods practically impossible. 



TABLE III: Results for the two experiments of the interfero- 
metric cross-coupling investigation. For the first experiment 
the true value of the cross-coupling is S21 = 0, while for the 
second 621 = 10~^. 



Method 




BxY 




Experiment 1 


Experiment 2 


RJMCMC 


0.000368 


35.177 


LF 


0.000326 


35.399 


LM 


0.000326 


35.669 


SBIC 


0.000167 


37.105 



The results of this investigation are summarized in 
Table IIIII In both cases we were able to obtain con- 
clusive results for the cross-coupling of the system: for 
the first experiment, the Bayes Factor is below unity. 



thus favoring the simpler model Y which does not con- 
tain the parameter (^21, while for the second experiment 
the favored model is X, that can describe a LTP with 
S21 > 0. Indeed, this result shows possible application 
of this methodology during flight operations. The Bayes 
Factor could be used to quantitatively discriminate com- 
peting interpretations of the data by the science team. 



V. USING THE BAYES FACTOR FOR 
EXPERIMENT OPTIMIZATION 



In this last section we explore the capability of the im- 
plemented model selection framework, not only as a set 
of tools for data analysis purposes, but also as a way to 
evaluate the efficiency of the experiments we are plan- 
ning to run in the satellite. As we are going to show, 
by comparing different models under different input sig- 
nal conditions, we can safely determine the best range 
of parameters that define our experiments, or verify the 
injection frequencies that maximize the information ex- 
tracted from the system. 
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FIG. 4: Difference on the FEEP thrusters response model. 
Even with such small difference in the two LTP models, we 
are able to select clearly the correct model if we inject a multi- 
frequency signal with high SNR. 
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FIG. 5: The Bayes Factor as a function of SNR computed 
using different methods. 



FIG. 6: The Bayes Factor as a function of the injection 
frequencies in the first interferometric channel. The com- 
puted evidence of model X is stronger when the sinusoidal 
injection signals have a frequencies around fi^_^ :^ 0.0006 
Hz and fi^ ^ 0.05 Hz respectively. 



A. The Bayes Factor as function of the SNR 



B. The Bayes Factor as function of the injection 
frequencies 



It has been shown [l6|, |29| that there is a dependence 
of the Bayes Factor output on the SNR regime of the 
investigations. This, of course, holds true in the case of 
the LTP as it can be seen in Fig. [5l This figure was 
created by simulating LTP experiments for each value of 
the SNR, while the injection signals were single frequency 
(/ = O.OlHz) sinusoidal inputs into the system. The LTP 
models under comparison were quite similar with the ex- 
ception of a different realization of the response model 
of the FEEP thrusters (see Figures [1] and 2]) . It is clear 
that above the critical value of the SNR = 21 the results 
obtained with the different techniques are consistent and 
in good agreement. Below that value of the SNR we can- 
not make clear decisions about the competing models, 
as the wrong model is showing preference, or we poorly 
approximate the Bayes Factor. 

Although this SNR limit varies, as expected, depend- 
ing on the type of the investigation and the models, the 
current result is already providing an estimation of the 
required power of the injection signals that we need to 
consider in the LTP experiment. This information and 
the method used here will be of particular interest for the 
design of in-flight experiments for the LPF mission. 



Furthermore, for system identification experiments, as 
in the case of the LTP, the computed evidence of a model 
depends on the design of the experiment itself. The infor- 
mation obtained from the system differs depending on the 
injection frequencies. An interesting study is to explore 
in detail this relation. To keep things simple, the models 
from section IIVBI can be put to use. A four- (X) and a 
five-dimensional (Y) models are examined, given differ- 
ent injection frequencies. More precisely, since the differ- 
ence between the models is the cross-coupling 621^ which 
describes the signal leakage from the first to the differen- 
tial interferometric channel, we examine the Bayes Factor 
given different injection frequencies to the first channel, 
while keeping constant the injection to the differential 
channel {fi^^2 — O-^ Hz). The SNR of this experiment is 
kept at the "low" value of 28. 

The data generation model is again mounted with a 
"perfect" interferometer {S21 = 0) and model X is the 
same as the one used to produce the data, while model 
Y is the one with the extra parameter S21. The expected 
outcome of this exploration is that if the system is more 
sensitive to the J21 parameter at some particular frequen- 
cies, we must detect an increase in the Bayes Factor. 

In Fig. [6] we can see the corresponding Bayes Factor 



versus the injected frequencies to the first channel. Given 
the low SNR of the investigation, while model X should 
be more favorable, a preference for the more complex 
model Y is shown for a certain set of frequencies. This re- 
sult is a clear indicator of the set of preferred frequencies 
that can be injected to the system for its characterization 
given the current configuration and SNR. Indeed, injec- 
tions around / c^ 0.0006 Hz and / c:^ 0.05 Hz promote 
the identification of the correct model, while injections 
at both the high and low frequency limit, together with 
/ c^ 0.01 Hz, may induce the analysis into an error. This 
frequency dependence must be associated to the sensitiv- 
ity of the experiment to a given parameter, the parame- 
ter ^21 in this particular case. This observed dependency, 
when considering a more realistic model, will be of par- 
ticular interest in the selection of injection signals for the 
experiments to be run in-flight. 



VI. CONCLUSIONS 
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part of the LPF data analysis. To verify the newly 
created RJMCMC method, the Laplace-Metropolis, the 
Laplace-Fisher and the Bayes Information Criterion 
methods were also added to the LTPDA toolbox. The 
RJMCMC algorithm is by far the most computationally 
costly, but at the same time it is the more suitable one 
when we compare more than two models or we work 
with inputs with low SNR. For the cases where the 
dimensionality is high (dim > 8), more than a total of 
10^ samples are required. The Laplace-Metropolis and 
the Laplace-Fisher methods are reliable when we work 
in the high SNR regime, but they also require significant 
computing time, specially when one has to use outlier 
detection methods to estimate the weighted covariance 
matrix. Moreover, the Laplace-Fisher approximation 
is limited by the use of the Fisher Information Matrix, 
which for the case of LTP state space models is com- 
puted numerically. The Bayes Information Criterion is 
computationally cheap but very sensitive to lower SNRs. 



We have implemented three different methods to com- 
pare competing models of the LTP experiment on-board 
the LPF mission: The RJMCMC algorithm, the Laplace 
approximations, and the Bayes Information Criterion. 
First, we have tested the methods with simple harmonic 
oscillator models (see Appendix [A]) and later, they have 
been applied to more complicated LTP models. These 
investigations have been conducted using synthetic data 
produced by the SSM simulator. 

In the case of the LTP, two different cases of model se- 
lection problems have been investigated. For the first 
case, we have considered five- and seven-dimensional 
models, where the importance of the extra two param- 
eters was examined. These two extra parameters are 
time delays caused by the LTP hardware and they can 
be characterized as essential parameters of the model. 
For the second case, the possibility of studying instru- 
mental cross-talks by the calculation of the Bayes Factor 
was explored. Using synthetic data, we have been able 
to distinguish whether there were interferometric cross- 
couphngs present in the system. 

The results from each method seem to agree, but the 
output strongly depends on the expected SNR and of 
course on the models under investigation. Consider- 
ing the LPF mission planned experiments, the SNR is 
high enough to safely use any of all the available tech- 
niques, but probably the most computationally demand- 
ing methods will be used for off-line analysis to confirm 
our first computations. 

An attempt to associate the output of the methods 
with the actual system identification experiment has been 
made. We have used different experiment setups to 
demonstrate that the Bayes Factor depends not only on 
the SNR, but also on the injection frequencies to the sys- 
tem. 

Finally, the RJMCMC algorithm employed in this 
work has been integrated in the LTPDA toolbox as 
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Appendix A: Tests with toy models 

In this appendix section, as an illustration of the meth- 
ods, a first application by comparing state-space har- 
monic oscillator models is presented. Such a system can 
be described as: 



y 



u 



-ijp' — 2iPuJouJ 



(Al) 



where y is the output, x the injection signal, n the read- 
out noise, /3 the damping coefficient, and uq is the natu- 
ral frequency of the system. A single frequency sinusoidal 
signal is being injected to the system and the output is 
simulated, given a fixed model and a Gaussian readout 
noise. Afterwards, we test analytical models with the 
expectation that the one that best describes the observa- 
tions is going to be favored. Thus, we expect to identify 
the model that we used to create the particular dataset. 
The true system values are: The mass m = 1 kg, the 
damping coefficient P = 0.1kg x s~^, and the spring con- 
stant /c = 0.1 kg X s~^. Then we can proceed to compare 
three different analytical models. For models A and B, we 
assume the mass to be a known parameter and equal to 1 
kg, while for model C, the mass is assumed to be equal to 
1,05 kg (wrong value). The unknown parameter for mod- 
els B and C is the spring constant k and for model A, both 
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k and /3 are assumed to be unknown. This information 
about the models is summarized in Table HVl In principle 



TABLE IV: Candidate harmonic oscillator models. 
Model Assumed parameters Parameters to fit 



A 
B 
C 



m = 1 kg 
= lkg, k = 0.01 kg X s-2 
1.05 kg, k = 0.01 kg X s-2 



k and (3 
k 
k 



we can compare as many models as we desire, but this, 
together with more complex models, greatly affects the 
convergence time required as expected. With the config- 
urations considered, we expect the algorithm to "favor" 
model A and B more than model C. The resulting output 
of the RJMCMC method can be viewed in Fig. [7] and 
it agrees with our expectations. The algorithm output 
was "positive" when comparing the simpler model B and 
the "erroneous" model C and when comparing models A 
and C, and it was "negative" when comparing the more 
complex model A against the simpler model B. The later 
supports better our data-set and we can claim that it is 
the most favored. It is an expected outcome, since the 
algorithm automatically applies "Occam's Razor" princi- 
ple to the higher-dimensional models. These results can 
be verified with the rest of the available methods imple- 
mented to approximate the Bayes factor (see Table |V|) . 



10,000 
1000 



2 100 
u 

(0 



■A/B 
■A/C 
■B/C 



" r. 



Possitive 
Negative 



0.1: 

0.01 







5 10 

Iterations 



15 



xlO 



FIG. 7: Output of the RJMCMC algorithm for the investiga- 
tion with the toy models of Table HVl Each line represents the 
evolution of the Bayes factor (comparing two models) against 
the number of iterations of the algorithm. The blue line is the 
BxY where X is model B and Y is model C. Using Eq. ([7j) 
and this plot we can say that the model B is approximately 
300 times more probable than model C. 



TABLE V: Bayes Factor estimates for the toy model experi- 
ment. The comparison is between models A and B of Table HVl 
for the same realization of synthetic noise that was used to 
produce Fig. [71 



Method 


BxY 


RJMCMC 


0.00189 


LF 


0.00183 


LM 


0.00177 


SBIC 


0.00198 
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